Association

An association (dependence) exists between two variables if a particular value/category for one variable is more likely to occur with certain values/categories of the other variable.

Response and explanatory variables:

  • Response variable: The dependent variable; it is the outcome variable on which we make comparisons.
  • Explanatory variable: The independent variable; it defines the groups that we compare with respect to the values/categories of the response variable.

Contingency table

Useful for looking at the associations between two categorical variables.

  • It displays two categorical variables
  • The rows list the categories of one variable
  • The columns list the categories of the other variable
  • Entries in the table are frequencies
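A contingency table can be built by counting how often each pair of categories occurs. The sketch below assumes the data arrive as (row category, column category) pairs. The observations and the helper name `contingency_table` are hypothetical and do not reproduce the original table.

```python
from collections import Counter

# Hypothetical paired observations (Gen, OS), one per respondent.
# Illustrative values only, not the data behind the original table.
observations = [
    ("M", "YES"), ("M", "YES"), ("M", "NO"), ("M", "YES"),
    ("F", "NO"), ("F", "YES"), ("F", "NO"), ("F", "NO"),
]

def contingency_table(pairs):
    """Hypothetical helper: count each (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # categories of one variable
    cols = sorted({c for _, c in pairs})   # categories of the other variable
    # Each entry is a frequency: the number of observations in that cell
    # (Counter returns 0 for combinations that never occur).
    return rows, cols, {r: {c: counts[(r, c)] for c in cols} for r in rows}

rows, cols, table = contingency_table(observations)
print("Gen", *cols, sep="\t")
for r in rows:
    print(r, *(table[r][c] for c in cols), sep="\t")
```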

The original table was:

Conditional proportions or percentages

Consider an example and assume that:

  • OS is the response variable.
  • Gen is the explanatory variable.
  • We look at how the distribution of OS changes as Gen changes.

Looking at this table:

  • 0.83 = proportion of YES under the condition Gen=M (conditional prop.)
  • 0.25 = proportion of YES under the condition Gen=F (conditional prop.)
  • 0.80 = proportion of YES (marginal proportion)

We can see that men are more likely to say YES: 0.83 of men answer YES, compared with 0.25 of women.
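Conditional proportions divide each cell by its row total (the count for that category of the explanatory variable), while marginal proportions divide the column total by the overall total. The sketch below uses a made-up table with Gen as rows and OS as columns; the counts and the helper names are hypothetical and are not the original table.

```python
# Hypothetical contingency table: rows = Gen (explanatory), columns = OS (response).
# Counts are invented for illustration.
table = {
    "M": {"YES": 40, "NO": 10},
    "F": {"YES": 15, "NO": 35},
}

def conditional_proportions(table):
    """Proportion of each response category within each row (explanatory category)."""
    return {
        row: {col: n / sum(cells.values()) for col, n in cells.items()}
        for row, cells in table.items()
    }

def marginal_proportions(table):
    """Proportion of each response category over the whole sample."""
    total = sum(sum(cells.values()) for cells in table.values())
    cols = next(iter(table.values())).keys()
    return {col: sum(cells[col] for cells in table.values()) / total for col in cols}

cond = conditional_proportions(table)
marg = marginal_proportions(table)
print(cond["M"]["YES"], cond["F"]["YES"], marg["YES"])
```

With these counts YES has conditional proportion 0.8 under Gen=M and 0.3 under Gen=F, while its marginal proportion is 0.55, which lies between the two conditional values.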

If there is no association between OS and Gen, then the conditional proportions for the response variable categories (OS) would be the same for each gender, like this:

In this case the two variables are said to be independent.
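Independence in this descriptive sense can be checked by comparing the conditional distribution of the response across rows. The sketch below assumes a hypothetical table built so that YES is 0.75 for both genders; `looks_independent` and its tolerance `tol` are illustrative names, not part of the original notes. With real samples the proportions are rarely exactly equal, which is why the check takes a tolerance.

```python
# Hypothetical table where the conditional proportions of OS are the same for each Gen.
table = {
    "M": {"YES": 30, "NO": 10},
    "F": {"YES": 45, "NO": 15},
}

def row_proportions(cells):
    """Conditional distribution of the response within one row."""
    total = sum(cells.values())
    return {col: n / total for col, n in cells.items()}

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    rows = [row_proportions(cells) for cells in table.values()]
    first = rows[0]
    return all(abs(r[col] - first[col]) <= tol for r in rows[1:] for col in first)

print(looks_independent(table))  # YES is 30/40 = 45/60 = 0.75 in both rows
```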

Another example:

And another one:

And another one:

Hint

In these examples, for each category of the response, we look for the category of the explanatory variable under which its conditional percentage is greater than the corresponding marginal percentage.
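The hint can be written as a short procedure: compute the marginal proportion of each response category, then keep the explanatory categories whose conditional proportion exceeds it. The table below is hypothetical, and `above_marginal` is an illustrative name.

```python
# Hypothetical contingency table: rows = explanatory categories, columns = response categories.
table = {
    "M": {"YES": 40, "NO": 10},
    "F": {"YES": 15, "NO": 35},
}

def above_marginal(table):
    """For each response category, list the explanatory categories whose
    conditional proportion is greater than that category's marginal proportion."""
    total = sum(sum(cells.values()) for cells in table.values())
    responses = next(iter(table.values())).keys()
    result = {}
    for resp in responses:
        marginal = sum(cells[resp] for cells in table.values()) / total
        result[resp] = [
            row for row, cells in table.items()
            if cells[resp] / sum(cells.values()) > marginal
        ]
    return result

print(above_marginal(table))
```

Here YES (marginal 0.55) is over-represented under M (0.8), and NO (marginal 0.45) is over-represented under F (0.7).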

Association of quantitative variables: scatterplot

  • Horizontal Axis: Explanatory variable, x.
  • Vertical Axis: Response variable, y.

Interpreting the scatterplot:

The variables are:

  • Positively associated when
    • High values of x tend to occur with high values of y
    • Low values of x tend to occur with low values of y

  • Negatively associated when high values of one variable tend to pair with low values of the other:
    • High values of x tend to occur with low values of y
    • Low values of x tend to occur with high values of y

  • Not associated when there is no such pattern: the value of x gives no indication of whether y tends to be high or low.
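One way to make "high with high, low with low" concrete is to measure each point's deviation from the means: the product (x − mean of x)(y − mean of y) is positive when both values are above, or both below, their means, and negative when one is above and the other below. The sketch below classifies the direction by the sign of the sum of these products (the numerator of the covariance). The function name is hypothetical, and the check only captures a rising or falling trend; a sum near zero does not rule out a curved pattern.

```python
def direction(xs, ys):
    """Hypothetical helper: direction of association from the sign of
    sum((x - mean_x) * (y - mean_y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Positive terms: high x with high y, or low x with low y.
    # Negative terms: high x with low y, or low x with high y.
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"

print(direction([1, 2, 3, 4], [2, 4, 5, 8]))
print(direction([1, 2, 3, 4], [8, 5, 4, 2]))
```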

The strength of a linear association can be measured through the correlation coefficient.
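A common choice is Pearson's correlation coefficient r, the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. r lies between −1 and 1: its sign gives the direction of the linear association and its absolute value the strength. The sketch below computes r from scratch; `pearson_r` is an illustrative name. Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - mx) ** 2 for x in xs)                    # squared deviations of x
    syy = sum((y - my) ** 2 for y in ys)                    # squared deviations of y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0: perfect positive linear association
```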